N=1:
rank | frequency | n-gram |
---|---|---|
1 | 25765 | -o |
2 | 24236 | -n |
3 | 17605 | -j |
4 | 17490 | -s |
5 | 16149 | -a |
N=2:
rank | frequency | n-gram |
---|---|---|
1 | 11025 | -on |
2 | 10383 | -oj |
3 | 7773 | -is |
4 | 6995 | -aj |
5 | 6872 | -jn |
N=3:
rank | frequency | n-gram |
---|---|---|
1 | 4603 | -ojn |
2 | 2827 | -taj |
3 | 2412 | -nte |
4 | 2411 | -toj |
5 | 2223 | -ajn |
N=4:
rank | frequency | n-gram |
---|---|---|
1 | 1601 | -ante |
2 | 1101 | -itaj |
3 | 968 | -anta |
4 | 926 | -tojn |
5 | 886 | -ntaj |
N=5:
rank | frequency | n-gram |
---|---|---|
1 | 521 | -antoj |
2 | 515 | -igxis |
3 | 510 | -antaj |
4 | 502 | -istoj |
5 | 378 | -igita |
The tables show the most frequent letter N-grams at the end of words for N=1…5. The procedure parallels that of 2.2.5 Most frequent word beginnings, except that the aim here is to detect suffixes rather than prefixes.
For N=3, the corresponding query is:
SELECT @pos := (@pos + 1) AS pos, xx.*
FROM (SELECT @pos := 0) r,
     (SELECT COUNT(*) AS cnt, CONCAT('-', RIGHT(word, 3)) AS ngram
      FROM words
      WHERE w_id > 100
      GROUP BY RIGHT(word, 3)
      ORDER BY cnt DESC) xx
LIMIT 5;
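The counting done by the query above can also be reproduced outside the database. The following is a minimal Python sketch, assuming the word list is available as a plain iterable of strings with one entry per distinct word. The function name `top_endings` and the sample words are hypothetical and serve only as illustration; they are not part of the original setup.

```python
from collections import Counter


def top_endings(words, n, limit=5):
    """Return (rank, frequency, ending) rows for the most frequent n-letter word endings."""
    # Like RIGHT(word, n) in the query, a word shorter than n letters
    # contributes its whole form as the "ending".
    counts = Counter(word[-n:] for word in words if word)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [
        (rank, freq, "-" + ending)
        for rank, (ending, freq) in enumerate(counts.most_common(limit), start=1)
    ]


# Hypothetical sample data, for illustration only.
sample = ["hundojn", "katojn", "librojn", "granda", "bela"]
for row in top_endings(sample, 3):
    print(row)
```

Calling `top_endings` with `n` set to 1 through 5 yields one result list per table. The sketch does not reproduce the `w_id > 100` filter of the query, since it depends on how the `words` table is laid out.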
2.2.5 Most frequent word beginnings